Abstract

The traditional information theoretic approach to studying feedback is to consider ideal instantaneous high-rate 
feedback of the channel outputs to the encoder. This was acceptable in classical work because the results were 
negative: Shannon pointed out that even perfect feedback often does not improve capacity, and in the context of
symmetric DMCs, Dobrushin showed that it does not improve the fixed block-coding error exponents in the
interesting high-rate regime. However, it has recently been shown that perfect feedback does allow great improvements
in the asymptotic tradeoff between end-to-end delay and probability of error, even for symmetric channels at high 
rate. Since gains are claimed with ideal instantaneous feedback, it is natural to wonder whether these improvements 
remain if the feedback is unreliable or otherwise limited. 

Here, packet-erasure channels are considered on both the forward and feedback links. First, the feedback
channel is taken as given and a strategy is presented to balance forward and feedback error correction in the
suitable information-theoretic limit of long end-to-end delays. At high enough rates, perfect-feedback performance
is asymptotically attainable despite having only unreliable feedback! Second, the results are interpreted in the
zero-sum case of "half-duplex" nodes where the allocation of bandwidth or time to the feedback channel comes at the
direct expense of the forward channel. It turns out that even here, feedback is worthwhile since dramatically lower
asymptotic delays are possible by appropriately balancing forward and feedback error correction.

The results easily generalize to channels with strictly positive zero-undetected-error capacities.



Wireless Foundations, Department of Electrical Engineering and Computer Science at the University of California at Berkeley. A preliminary 
version of these results was presented at the 2007 ITA Workshop in San Diego. 
Balancing forward and feedback error correction for
erasure channels with unreliable feedback

I. Introduction 

The three most fundamental parameters when it comes to reliable data transport are probability of error,
end-to-end system delay, and data rate. Error probability is critical because a low probability of bit error lies at the
heart of the digital revolution justified by the source/channel separation theorem. Delay is important because it is 
the most basic cost that a system must pay in exchange for reliability — it allows the laws of large numbers to be 
harnessed to smooth out the variability introduced by random communication channels. 

Data rate, while apparently the most self-evidently important of the three, also brings out the seemingly pedantic 
question of units: do we mean bits per second or bits per channel use? In a point-to-point setting without feedback, 
there is an unambiguous mapping between the two of them given by the number of channel uses per second. When 
feedback is allowed, an ambiguity arises: how should the feedback channel uses be accounted for? Are they "free" 
or do they need to be counted? 

Traditionally, the approach has been to consider feedback as "free" because the classical results showed that in 
many cases even free feedback does not improve the capacity [1], and in the fixed-length block-coding context, 
does not even improve the tradeoff between the probability of error and delay in the high-rate regime for symmetric 
channels [2], [3]. (See [4] for a detailed review of this literature.) Thus, the answer was simple: if feedback comes 
at any cost, it is not worth using for memoryless channels. 

In [4], we recently showed that perfect feedback is indeed quite useful in a "streaming" context if we are willing 
to use non-block codes to implement our communication system. In the natural setting of a message stream being 
produced at a fixed (deterministic) rate of R bits per second, feedback does provide^1 a tremendous advantage
in terms of the tradeoff between end-to-end delay and the probability of bit error. As the desired probability of 
error tends to zero, feedback reduces the end-to-end delay by a factor that approaches infinity as the desired rate 
approaches capacity. In [4], the resulting fixed-delay error exponents with feedback are calculated exactly for erasure 
channels and channels with strictly positive zero-error capacity. For general DMCs, [4] gives a general upper bound 
(the uncertainty-focusing bound) along with a suboptimal construction that substantially outperforms codes without 
feedback at high rates. 

Once it is known that perfect feedback is very useful, it is natural to ask whether this advantage persists if 
feedback is costly, rate-limited, or unreliable in some way. After all, the real question is not whether perfect
feedback would be useful but what kind of imperfect feedback is worth designing into real communication systems. This has
long been recognized as the Achilles' Heel for the information-theoretic study of feedback. Bob Lucky in [6] stated 
it dramatically: 

Feedback communications was an area of intense activity in 1968. A number of authors had shown
constructive, even simple, schemes using noiseless feedback to achieve Shannon-like behavior. . . . The situation in 1973 is
dramatically different. . . . The subject itself seems to be a burned out case. . . .

In extending the simple noiseless feedback model to allow for more realistic situations, such as noisy feedback 
channels, bandlimited channels, and peak power constraints, theorists discovered a certain "brittleness" or sensitivity 
in their previous results. 

The literature on imperfect feedback in the context of memoryless channels is relatively thin. Schulman and others 
have studied interaction over noisy channels in the context of distributed computation [7], [8]. The computational 
agents can only communicate with each other through noisy channels in both directions. In Schulman's formulation, 
neither capacity nor end-to-end delay is a question of major interest since constant factor slowdowns are seemingly 
unavoidable. The focus is instead on making sure that the computation succeeds and that the slowdown factor does 
not scale with problem size (as it would for a purely repetition based strategy). 

On the reliability side, until recently, all the limited successes were for continuous-alphabet AWGN-type channels
following the Schalkwijk/Kailath model from [9], [10]. Kashyap in [11] introduced a scheme that tolerated noise on
the feedback link, but it used asymptotically infinite (in the block-length) power on the feedback link to overcome
it. Only the work by Kramer [12] and Lavenburg [13], [14] operated with finite average power in both
directions. But these were all cases in which the average nature of the power-constraint played an important role.
Recently, the AWGN story with noisy feedback has also attracted the interest of Weissman, Lapidoth and Kim
in [15], who rigorously proved a strongly negative folk result for the case of uncoded channel-output feedback
corrupted by arbitrarily low levels of Gaussian noise. At the same time, [16] showed that if the feedback noise has
bounded support, then techniques similar to those of [17] could preserve reliability gains, but only at the price of
having to back away from the capacity of the forward link.

1 This overturned Pinsker's incorrect assertion in Theorem 8 of [5] that feedback gives no asymptotic advantage in this nonblock setting.

For finite alphabets, we have recently had some success in showing robustness to imperfect feedback in the "soft 
deadlines" context where the decoder is implicitly allowed to postpone making a decision, as long as it does not 
do so too often. With perfect feedback, this has traditionally been considered in the variable-block-length setting 
where Burnashev's bound of [18] gives the ultimate limit with perfect feedback and Yamamoto-Itoh's scheme 
of [19] provides the baseline architecture. [20] showed that if the feedback channel was noisy, but of very high 
quality, then the loss relative to the Burnashev bound could be quite small by appropriately using anytime codes 
and pipelining. [21] allowed bursty noiseless feedback, but constrained its overall rate to show that by using hashing 
ideas, something less than full channel output feedback could be used while still attaining the Burnashev bound. 
The ideas of [20], [21] were combined in [22] to show that it was possible to get reliability within a factor of two of 
the Burnashev bound as long as the noisy feedback channel's capacity was higher than the capacity of the forward 
link and this was further tightened in [23] to a factor that approaches one as the target rate approached capacity. 
This story culminates in [24] where it is shown that from the system-level perspective, Burnashev's bound is not 
the relevant target. Instead, Kudryashov's performance with noiseless feedback in [25] (better than the Burnashev 
Bound) can in fact be asymptotically attained robustly as long as the feedback channel's capacity is larger than the 
target reliability. 

The focus here is on the problem of fixed rate and fixed end-to-end delay in the style of [4] where the decoder is 
not allowed to postpone making a decision. This paper restricts attention to the case of memoryless packet-erasure 
channels where the feedback path is also an unreliable packet-erasure channel. Recently, Massey [26] has had 
some interesting thoughts on this problem, but he claims no asymptotic reliability gains for uncertain feedback 
over no feedback. The issue of balancing forward and feedback error correction has also attracted attention in the 
networking community (see, e.g., [27]), but the focus there is not purely on reliability or delay but is mixed with the
issue of adapting to bursty channel variation as well as fairness with other streams. 

Section II establishes the notation and states the main results, with the proofs following in subsequent sections.
The results are stated for the concrete case of packet-erasure channels. Section II also plots the performance for
some examples and compares the results with the baseline approaches of only using forward error correction and
just using feedback for requesting retransmissions at the individual packet level.

In Section III, the feedback channel uses are considered "free" in that they do not compete with forward
channel uses for access to the underlying communication medium. Adapting arguments from [4], it is shown 
that asymptotically, perfect feedback performance can be attained even with unreliable feedback. Because the 
uncertainty-focusing bound of [4] is met at rates that are high enough, it is known that this is essentially optimal. 
If the target rate is too low, then the dominant error event for the scheme turns out to be the feedback channel 
going into complete outage and erasing every packet. 

At the end of Section III, it is noted that the main result generalizes naturally from packet-erasure channels to
DMCs whose zero-undetected-error capacity is equal to their Shannon capacity. As a bonus^2, the same techniques
give rise to a generalization of Theorem 3.4 of [4] and show the achievability with perfect feedback of the symmetric 
uncertainty-focusing bound at high rate for any channel whose probability matrix contains a nontrivial zero. 

Section IV reinterprets the results of the previous section to address the question of "costly" feedback in that both
the data rate and delay are measured not relative to forward channel uses, but relative to the sum of feedback and 
forward channel uses. This models the case when the feedback channel uses the same underlying communication 
resource as the forward channel in a zero-sum way (e.g. time-division or frequency-division in wireless networks). 
It is shown that there is a tremendous advantage to using some of the channel uses for feedback. 

2 Dear Reviewers: the result referred to here does not really fit in with the overall theme of unreliable feedback, but is placed in this paper 
since the techniques used are common. 
Fig. 1. The problem of communication with unreliable feedback. For the most part in this paper, both of the unreliable channels $P_f$ and
$P_b$ are packet erasure channels that are independent of each other.

Fig. 2. Feedback and forward channel uses. The forward channel has packet-erasure probability $\beta_f$ while the feedback channel has
packet-erasure probability $\beta_b$. In this picture $k_f = 6$, $k_b = 1$, so a fraction $\frac{1}{7}$ of the total channel uses are allocated to feedback.



II. Definitions and main results 
The problem and notation are illustrated in Figures 1 and 2. Formally:

Definition 2.1: A $C_p$-bit $\beta$-erasure channel refers to a discrete memoryless channel that accepts packets $X(t)$
consisting of $C_p$ bits (thought of as integers from $0$ to $2^{C_p} - 1$ or strings from $\{0, 1\}^{C_p}$) per packet as inputs and
either delivers the entire packet perfectly, $Y(t) = X(t)$, with probability $1 - \beta$ or erases the whole packet, $Y(t) = -1$,
with probability $\beta$.
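
As an illustration of Definition 2.1, the following is a minimal Python sketch of a single use of a packet-erasure channel. The function name `erasure_channel`, its argument names, and the use of Python's `random` module are illustrative assumptions and do not come from the paper.

```python
import random

ERASURE = -1  # the erasure symbol, Y(t) = -1 in Definition 2.1


def erasure_channel(packet, packet_bits, beta, rng=random):
    """Pass one packet through a packet_bits-bit beta-erasure channel.

    `packet` is an integer in [0, 2**packet_bits - 1]. The whole packet is
    erased with probability beta and delivered unchanged otherwise.
    """
    if not 0 <= packet < 2 ** packet_bits:
        raise ValueError("packet does not fit in packet_bits bits")
    return ERASURE if rng.random() < beta else packet


# Send the same 4-bit packet many times over a 0.25-erasure channel.
rng = random.Random(0)
outputs = [erasure_channel(5, 4, 0.25, rng) for _ in range(10000)]
erased_fraction = outputs.count(ERASURE) / len(outputs)
```

The empirical `erased_fraction` should be close to the erasure probability, and every output is either the input packet or the erasure symbol; the channel never delivers a corrupted packet.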

Definition 2.2: The $(k_f, k_b, C_f, C_b, \beta_f, \beta_b)$ problem consists of a system in which one cycle of interaction between
encoders and decoders consists of $k_f$ independent packets being sent along a $C_f$-bit $\beta_f$-erasure channel along the
forward direction and $k_b$ independent packets being sent along a $C_b$-bit $\beta_b$-erasure channel along the reverse
direction.

A feedback encoder $\mathcal{E}_b$ for this problem is a sequence of maps $\mathcal{E}_{b,t}$. The range of each map is $k_b$ packets
$(X_b(k_b t + 1), X_b(k_b t + 2), \ldots, X_b(k_b t + k_b))$ consisting of $C_b$ bits each. The $t$-th map takes as input all the
available forward channel outputs $(Y_f(1), \ldots, Y_f(k_f t))$ so far.

A rate $R$ forward encoder $\mathcal{E}_f$ for this problem is also a sequence of maps $\mathcal{E}_{f,t}$. The range of each map is $k_f$
packets $(X_f(k_f t + 1), X_f(k_f t + 2), \ldots, X_f(k_f t + k_f))$ consisting of $C_f$ bits each. The $t$-th map takes as input all
the available feedback channel outputs $(Y_b(1), \ldots, Y_b(k_b t))$ as well as the message bits $B_1, B_2, \ldots$ that have arrived
so far.

The rate $R$ above is in terms of forward channel uses only and is normalized in units of $C_f$-bit packets. The rate $R'$
in terms of overall channel uses is $R' = R \frac{k_f}{k_f + k_b}$. The rate $\bar{R}$ in terms of weighted channel uses is
$\bar{R} = R \frac{k_f C_f}{k_f C_f + k_b C_b}$.

A delay $d$ rate $R$ decoder is a sequence of maps $\mathcal{D}_i$. The range of each map is an estimate $\hat{B}_i$ for the $i$-th bit
taken from $\{0, 1\}$. The $i$-th map takes as input the available channel outputs $(Y_f(1), Y_f(2), \ldots, Y_f(\lceil \frac{i}{R k_f C_f} \rceil k_f + d k_f))$.
This means that it can see $d k_f$ channel uses beyond when the bit to be estimated first had the potential to influence
the channel inputs.

Just as rate can be expressed in different units, so can delay. The delay $d$ above is in terms of forward channel
uses only. The delay $d'$ in terms of overall channel uses is $d' = d \frac{k_f + k_b}{k_f}$. The delay $\bar{d}$ in terms of weighted channel
uses is $\bar{d} = d \frac{k_f C_f + k_b C_b}{k_f C_f}$.

All encoders and decoders are assumed to be randomized and have access to infinite amounts of common 
randomness that is independent of both the messages as well as the channel unreliability. 

The notation above captures the real flexibility of interest. The distinction between $C_f$ and $C_b$ allows for forward
and feedback packets to be of different size. The distinction between $k_f$ and $k_b$ summarizes the relative width of
the forward and feedback pipes. Similarly, the distinction between $\beta_f$ and $\beta_b$ allows for the channel to be more or
less reliable in the different directions.

The three kinds of units correspond to three different ways of thinking about the problem. 

• The unadorned R and d take the traditional approach of considering the feedback to be entirely free, although 
there is now a limited amount of it and it is imperfect. 

• The $R'$ and $d'$ consider feedback to be costly, but in terms of channel uses only. The relative size of the packets
is considered unimportant. It is clear that as long as some feedback is present, $R' < R$ and $d' > d$. These
metrics give an incentive to use less feedback if possible.

• The $\bar{R}$ and $\bar{d}$ also consider the relative size of the packets significant. These metrics give an incentive to use
shorter feedback packets.

It is even interesting to consider combinations of metrics. The combination of $\bar{R}$ and $d'$ is particularly interesting
since it corresponds to the case when feedback is implemented as bits stolen from message-carrying packets coming
in the reverse direction. Using $\bar{R}$ gives an incentive to make the number of bits taken small, but using $d'$ captures
the fact that the delay in real time units includes the full lengths of both the intervening forward and feedback
packets.
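
The three systems of units can be made concrete with a short Python sketch. It assumes the conversions are the ratios described above: forward channel uses over total channel uses for the primed quantities, and forward bits over total bits for the weighted quantities. The function name `convert_units` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def convert_units(R, d, k_f, k_b, C_f, C_b):
    """Convert forward-only rate R and delay d into the other two unit systems.

    Assumed conversions (exact rational arithmetic):
      per overall channel use:   R' = R * k_f / (k_f + k_b),  d' = d * (k_f + k_b) / k_f
      per weighted channel use:  scale by k_f*C_f / (k_f*C_f + k_b*C_b) in the same way
    """
    R, d = Fraction(R), Fraction(d)
    uses = Fraction(k_f, k_f + k_b)
    weighted = Fraction(k_f * C_f, k_f * C_f + k_b * C_b)
    return {
        "R_prime": R * uses,
        "d_prime": d / uses,
        "R_bar": R * weighted,
        "d_bar": d / weighted,
    }


# The split of Fig. 2 (k_f = 6, k_b = 1) with 8-bit forward and 1-bit feedback packets.
example = convert_units(Fraction(1, 2), 70, k_f=6, k_b=1, C_f=8, C_b=1)
```

With these numbers the penalty in channel-use units (a factor of 6/7 in rate) is noticeably larger than the penalty in weighted units (a factor of 48/49), which is why the weighted metrics reward short feedback packets.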

Definition 2.3: The fixed-delay error exponent $\alpha$ is asymptotically achievable at message rate $R$ across a noisy
channel if for every delay $d_j$ in some increasing sequence $d_j \to \infty$ there exist rate $R$ encoders $(\mathcal{E}_f^j, \mathcal{E}_b^j)$ and delay
$d_j$ decoders $\mathcal{D}^j$ that satisfy the following properties when used with input bits $B_i$ drawn from iid fair coin tosses.

1) For the $j$-th code, there exists an $\epsilon_j < 1$ so that $P(B_i \neq \hat{B}_i(d_j)) \le \epsilon_j$ for every bit position $i \ge 1$. The $\hat{B}_i(d_j)$
represents the delay $d_j$ estimate of $B_i$ produced by the $(\mathcal{E}_f^j, \mathcal{E}_b^j, \mathcal{D}^j)$ triple connected through the channels in
question.

2) $\lim_{j \to \infty} \frac{-\ln \epsilon_j}{d_j} \ge \alpha$.

The exponent $\alpha$ is asymptotically achievable universally over delay or in an anytime fashion if a single encoder
pair $(\mathcal{E}_f, \mathcal{E}_b)$ can be used simultaneously for all sufficiently long $d_j$. Only the decoders $\mathcal{D}^j$ have to change with
the target delay $d_j$.

The error exponents $\alpha'$ and $\bar{\alpha}$ are defined analogously but use the $d'_j$ and $\bar{d}_j$ versions of delay.



A. Main Results 

Theorem 2.1: Given the $(k_f, k_b, C_f, C_b, \beta_f, \beta_b)$ problem with $k_f, k_b \ge 1$, forward packet size $C_f \ge 1$, and
feedback packet size $C_b \ge 1$, it is possible to asymptotically achieve all fixed-delay reliabilities

$$\alpha < \min\left(-\frac{k_b}{k_f} \ln \beta_b,\; E_0(C_f, 1)\right) \qquad (1)$$

where the Gallager function for the forward channel is

$$E_0(C_f, \rho) = -\ln\left(\beta_f + 2^{-\rho C_f}(1 - \beta_f)\right), \qquad (2)$$

as long as the rate $R$ in normalized $C_f$ units satisfies

$$R < \frac{\alpha}{\alpha + \ln\left(\frac{1 - \beta_f}{1 - \exp(\alpha)\beta_f}\right)}. \qquad (3)$$

Furthermore, these fixed-delay reliabilities are obtained in an anytime fashion.
Proof: See Section III-A.

The uncertainty-focusing bound from [4] for this problem assuming perfect feedback is easily calculated to be
given by (3), but it holds for all $0 < \alpha < -\ln \beta_f$. Since lower reliabilities are associated with higher rates, this shows
that the result of Theorem 2.1 is asymptotically optimal at high enough rates. The feedback packet size needs to be
only one bit and there is similarly no restriction on the size of forward packets. Since $\lim_{C_f \to \infty} E_0(C_f, 1) = -\ln \beta_f$,
the sense of high enough rates given by (1) depends only on the relative frequency and reliability of the feedback
link as the forward packet size tends to infinity.
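
To get a feel for the numbers in Theorem 2.1, the following Python sketch evaluates the Gallager function, the reliability ceiling, and the rate bound. It assumes the rate bound has the form $R < \alpha / (\alpha + \ln\frac{1-\beta_f}{1-\exp(\alpha)\beta_f})$ in units of $C_f$-bit packets, which is the form obtained by eliminating $\rho$ from $\alpha = E_0(C_f, \rho)$ and $R = E_0(C_f, \rho)/(\rho C_f \ln 2)$. The function names are hypothetical.

```python
import math


def gallager_e0(C_f, rho, beta_f):
    """E_0(C_f, rho) = -ln(beta_f + 2**(-rho*C_f) * (1 - beta_f)), in nats."""
    return -math.log(beta_f + 2.0 ** (-rho * C_f) * (1.0 - beta_f))


def reliability_ceiling(k_f, k_b, C_f, beta_f, beta_b):
    """Largest exponent allowed by the min(...) condition of Theorem 2.1."""
    return min(-(k_b / k_f) * math.log(beta_b), gallager_e0(C_f, 1.0, beta_f))


def rate_bound(alpha, beta_f):
    """Assumed form of the rate bound, in C_f-bit packets per forward use."""
    if not 0.0 < alpha < -math.log(beta_f):
        raise ValueError("alpha must lie strictly between 0 and -ln(beta_f)")
    return alpha / (alpha + math.log((1.0 - beta_f) / (1.0 - math.exp(alpha) * beta_f)))


# 4-bit forward packets, 0.25 erasure probability on both links, k_f = k_b = 1.
ceiling = reliability_ceiling(1, 1, 4, 0.25, 0.25)
near_capacity = rate_bound(1e-6, 0.25)
```

As the target exponent shrinks toward zero the rate bound approaches the forward capacity $1 - \beta_f$, and it decreases as the exponent grows, matching the statement that lower reliabilities are associated with higher rates.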

When the packet sizes are at least two bits long in both directions, asymptotic delay performance can be slightly 
improved at low rates. 
Theorem 2.2: Given the $(k_f, k_b, C_f, C_b, \beta_f, \beta_b)$ problem with $k_f, k_b \ge 1$, $C_f \ge 2$, and feedback packet size
$C_b \ge 2$, it is possible to asymptotically achieve all fixed-delay reliabilities

$$\alpha < -\frac{k_b}{k_f} \ln \beta_b \qquad (4)$$

as long as the rate $R$ in normalized $C_f$ units satisfies

$$R < \left(\frac{C_f - 1}{C_f}\right) \frac{\alpha}{\alpha + \ln\left(\frac{1 - \beta_f}{1 - \exp(\alpha)\beta_f}\right)}. \qquad (5)$$

Furthermore, these fixed-delay reliabilities are obtained in an anytime fashion.
Proof: See Section III-B.

The upper limit on reliability for the scheme given by (4) corresponds to the event that the feedback channel
erases every feedback packet for the entire duration of $d$ cycles. If $k_f$ and $k_b$ could be chosen by the system
designer, this constraint could be made non-binding simply by choosing $\frac{k_b}{k_f}$ large enough. However, if there is such
flexibility, it is only fair to also penalize based on the total resources used, rather than only penalizing forward
channel uses.

Theorem 2.3: Given only $(C_f \ge 1, C_b \ge 1, \beta_f > 0, \beta_b > 0)$, the $k_f > 0$ and $k_b > 0$ can be chosen to
asymptotically achieve all $(R', \alpha')$ pairs that are contained within the parametric region:

$$\alpha' < E'_0(C_f, \rho),$$
$$R' < \frac{E'_0(C_f, \rho)}{\rho C_f \ln 2} \qquad (6)$$

where

$$E'_0(C_f, \rho) = \left((-\ln \beta_b)^{-1} + (E_0(C_f, \rho))^{-1}\right)^{-1} \qquad (7)$$

and the Gallager function $E_0(C_f, \rho)$ is defined in (2) and $\rho$ ranges from $0$ to $1$.
If furthermore $C_f \ge 2$, $C_b \ge 2$, then the following region is also attainable:

$$\alpha' < E'_0(C_f - 1, \rho),$$
$$R' < \frac{E'_0(C_f - 1, \rho)}{\rho C_f \ln 2} \qquad (8)$$

with $\rho$ ranging from $0$ to $\infty$.

If $C_b$ stays constant while $C_f$ can be chosen as large as desired, then $(\bar{R}, \alpha')$ in

$$\alpha' < E'_0(C_f, \rho), \qquad (9)$$

can be achieved where $\rho \in [0, 1]$. If the $(\bar{R}, \bar{\alpha})$ tradeoff is desired, use (9) for $\bar{R}$ with $\bar{\alpha} < E_0(C_f, \rho)$.

All of these fixed-delay reliabilities are obtained in an anytime fashion.
Proof: See Section IV.
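
The first parametric region of Theorem 2.3 can be traced numerically. The Python sketch below assumes that the combined exponent is the harmonic combination $E'_0 = ((-\ln\beta_b)^{-1} + E_0^{-1})^{-1}$ and that the rate boundary is $E'_0/(\rho C_f \ln 2)$; it also reports the ratio $k_b/k_f$ that equalizes the two terms of the reliability constraint of Theorem 2.1, which is one way to arrive at that harmonic form. The function names and the returned tuple layout are hypothetical.

```python
import math


def gallager_e0(C_f, rho, beta_f):
    """E_0(C_f, rho) = -ln(beta_f + 2**(-rho*C_f) * (1 - beta_f)), in nats."""
    return -math.log(beta_f + 2.0 ** (-rho * C_f) * (1.0 - beta_f))


def shared_channel_point(C_f, rho, beta_f, beta_b):
    """Boundary point (alpha', R') for a given rho > 0, plus the k_b/k_f split.

    Assumes alpha' = E_0 / (1 + k_b/k_f) with k_b/k_f chosen so that
    -(k_b/k_f) * ln(beta_b) equals E_0(C_f, rho).
    """
    e0 = gallager_e0(C_f, rho, beta_f)
    feedback_exponent = -math.log(beta_b)
    e0_prime = 1.0 / (1.0 / feedback_exponent + 1.0 / e0)
    rate_prime = e0_prime / (rho * C_f * math.log(2.0))
    feedback_ratio = e0 / feedback_exponent
    return e0_prime, rate_prime, feedback_ratio


# Sweep rho over (0, 1] for 8-bit packets and 0.25 erasure on both links.
curve = [shared_channel_point(8, n / 100.0, 0.25, 0.25) for n in range(1, 101)]
```

Near $\rho = 0$ the required share of feedback channel uses vanishes and the rate approaches the forward capacity, which is consistent with feedback costing little exactly where it helps most.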



B. Pure strategies and comparison plots 

To understand the implications of Theorems 2.1, 2.2, and 2.3, it is useful to compare them to what would be
obtained using strategies for reliable communication that use only feedback error correction or only forward error 
correction. 
1) Pure forward error correction: The simplest approach is to ignore the feedback and just use a fixed-delay
code. [28], [29] reveal that infinite-constraint-length convolutional codes achieve the random-coding error exponent
$E_r(R)$ with respect to end-to-end delay and [4] tells us that we cannot do any better than the sphere-packing bound
without feedback.

$$E_r(R) = \max_{\rho \in [0, 1]} E_0(C_f, \rho) - \rho R C_f \ln 2 \qquad (10)$$

$$E_{sp}(R) = \max_{\rho \in [0, \infty)} E_0(C_f, \rho) - \rho R C_f \ln 2 \qquad (11)$$

where $R$ ranges from $0$ to the forward capacity of $1 - \beta_f$ packets (of size $C_f$ each) per channel use.

Using a fixed-length block code would introduce another factor of two in end-to-end delay since the message to 
be transmitted would first have to be buffered up to make a block. 
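
The two forward-only exponents can be evaluated numerically with a simple grid search over $\rho$. This Python sketch is only an approximation: it assumes the rate term is converted to nats with a factor of $\ln 2$ so that it matches the natural-log Gallager function, and it truncates the sphere-packing range $[0, \infty)$ at a finite, arbitrarily chosen `rho_max`. The function names are hypothetical.

```python
import math


def gallager_e0(C_f, rho, beta_f):
    """E_0(C_f, rho) = -ln(beta_f + 2**(-rho*C_f) * (1 - beta_f)), in nats."""
    return -math.log(beta_f + 2.0 ** (-rho * C_f) * (1.0 - beta_f))


def forward_only_exponent(R, C_f, beta_f, rho_max, steps=20000):
    """Grid-search max over rho in [0, rho_max] of E_0 - rho * R * C_f * ln 2.

    rho_max = 1 approximates the random-coding exponent; a large rho_max
    approximates the sphere-packing bound.
    """
    best = 0.0  # the value at rho = 0
    for n in range(steps + 1):
        rho = rho_max * n / steps
        value = gallager_e0(C_f, rho, beta_f) - rho * R * C_f * math.log(2.0)
        best = max(best, value)
    return best


# 4-bit packets over a 0.25-erasure channel at a low rate of 0.1 packets per use.
random_coding = forward_only_exponent(0.1, 4, 0.25, rho_max=1.0)
sphere_packing = forward_only_exponent(0.1, 4, 0.25, rho_max=50.0)
```

At this low rate the sphere-packing value is strictly larger than the random-coding value, and both fall to zero as the rate reaches the forward capacity $1 - \beta_f$.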

2) Pure feedback error correction: The intuitive "repeat until success" strategy for perfect feedback analyzed in 
[4] can be adapted to when the feedback is unreliable. For simplicity, focus on kf = kb = 1. The idea is for the 
feedback encoder to use 1 bit on the feedback channel to indicate if the forward packet was received or not. If this 
feedback packet is not erased, the situation is exactly as it is when feedback is perfect. When the feedback packet 
is erased, the safe choice for the forward encoder is to retransmit. However, this requires some way for the decoder 
to know that the incoming packet is a retransmission rather than a new packet. The practical way this problem is 
solved is by having sequence numbers on packets. As [26] points out, only 1 bit of overhead per forward packet 
is required for the sequence number in this scenario. 

Thus, the resulting system behaves like a repeat until success system with perfect feedback with two modifications: 

• The effective forward packet size goes from Cf bits to Cf — 1 bits to accommodate the 1-bit sequence numbers. 

• The effective erasure probability goes from $\beta_f$ to $1 - (1 - \beta_f)(1 - \beta_b)$ because an erasure on either forward
or feedback channel demands a retransmission.

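Thus an error exponent of $\alpha$ with respect to end-to-end delay can be achieved as long as the rate $R$ (in units of
$C_f$ bits at a time) satisfies (using Theorem 3.3 in [4] and adjusting for $\alpha$ being in base $e$ rather than base 2):

$$R < \left(\frac{C_f - 1}{C_f}\right) \frac{\alpha}{\alpha + \ln\left(\frac{(1 - \beta_f)(1 - \beta_b)}{1 - \exp(\alpha)\left(1 - (1 - \beta_f)(1 - \beta_b)\right)}\right)} \qquad (12)$$

and $0 < \alpha < -\ln\left(1 - (1 - \beta_f)(1 - \beta_b)\right)$. No higher $\alpha$ can be achieved by this scheme. Even as $\alpha \to 0$ and $C_f \to \infty$,
the above rate only reaches $(1 - \beta_f)(1 - \beta_b)$ and thus is bounded away from the capacity $1 - \beta_f$ of the forward
link.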
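
The rate penalty of the feedback-only strategy is easy to check numerically. The Python sketch below assumes the rate bound is the perfect-feedback repeat-until-success expression evaluated at the effective erasure probability $1 - (1-\beta_f)(1-\beta_b)$ and scaled by $(C_f - 1)/C_f$ for the one-bit sequence number. The function name `arq_rate_bound` is hypothetical.

```python
import math


def arq_rate_bound(alpha, C_f, beta_f, beta_b):
    """Assumed rate bound (C_f-bit packets per use) for repeat-until-success."""
    beta_eff = 1.0 - (1.0 - beta_f) * (1.0 - beta_b)  # either link erasing forces a resend
    if not 0.0 < alpha < -math.log(beta_eff):
        raise ValueError("alpha must lie strictly between 0 and -ln(beta_eff)")
    core = alpha / (alpha + math.log((1.0 - beta_eff) / (1.0 - math.exp(alpha) * beta_eff)))
    return (C_f - 1) / C_f * core  # one bit per packet is spent on the sequence number


# Even with huge packets and a vanishing exponent, the rate stays below capacity 0.75.
best_case = arq_rate_bound(1e-6, 10 ** 6, 0.25, 0.25)
short_packets = arq_rate_bound(1e-6, 4, 0.25, 0.25)
```

With 0.25 erasure probability in both directions the best case is about 0.5625 packets per use, well short of the forward capacity of 0.75, and with 4-bit packets the header costs a further quarter of the rate.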

3) Comparison: Three scenarios are considered. Figure 3 sets $k_f = k_b = 1$ and compares the pure feedback
and pure forward error correction strategies to the balanced approach of Theorem 2.1 when the erasure probabilities
on the forward and feedback links are the same. The limit of $C_f \to \infty$ is shown, although this is only
significant at low rates. If the feedback link were more unreliable than the forward link, then the reliability gains
from Theorem 2.1 would saturate at lower rates. Looking at the curves in the vicinity of capacity shows clearly
that the factor reduction in asymptotically required end-to-end delay over purely forward error correction tends to
infinity.

Figure 4 illustrates the difference between Theorems 2.1 and 2.2. For high rates, Theorem 2.1 is better. But at low
rates, Theorem 2.2 provides better reliability and hence shorter asymptotic end-to-end delays. When the packets are
short, the capacity penalty for allocating 1 bit for a header can be significant, as the plot illustrates using $C_f = 4$.

Figure 5 illustrates the scenario of Theorem 2.3 in that it assumes that there is a single shared physical channel
that must be divided between forward and feedback channel uses. Somewhat surprisingly, feedback becomes more
valuable the closer the system comes to capacity. The factor reduction in asymptotically required end-to-end delay
over purely forward error correction tends to infinity as the data rate approaches capacity. This shows that, at least in
the packet-erasure case, feedback is worth implementing even if it comes at the cost of taking resources away from
the forward path.

Finally, Figure 6 illustrates the impact of how rate and delay are counted. Notice that at high rates, the curve
in which the feedback is counted against the delay but not the rate is very close to the case in which feedback is 
free. This makes sense when the packets are large and there is presumably independent data coming in the opposite 
direction. 
Fig. 3. The error exponents governing the asymptotic tradeoff between the probability of error and end-to-end delay for a 0.25-erasure
channel with a separate 0.25-erasure channel on the feedback link. The top curve is the uncertainty-focusing bound from [4] that is optimal
assuming that feedback is perfect. If the probability of erasure on the reverse channel were increased to 0.50, the achievable exponents for
the schemes of this paper would saturate at $-\ln 0.5$. For comparison, the simple feedback-only (with 0.25 erasure on both forward and feedback
links) strategy achieves only the lower curve. The forward-only curve bounds what is possible without using feedback in general and also
what is possible with feedback if the system is restricted to fixed-length block codes.



III. Unreliable, but "free" feedback 

Throughout this section, the $k_f$ and $k_b$ are considered to be a given. The goal is to prove Theorems 2.1 and 2.2.
The basic idea is to adapt the $(n, c, l)$ scheme of [4] to this situation. Corollary 6.1 of [4] is the key tool used to
prove the results.

A. Theorem 2.1: no list decoding
The scheme used is: 

1) Group incoming bits into blocks of size $n c k_f R C_f$ each. Assume that $n$ and $c$ are both large, but $n c k_f$ is
small relative to the target end-to-end delay measured in forward channel uses.

2) Hold blocks in a FIFO queue awaiting transmission. The first block is numbered 1 with numbers incrementing
by 1 thereafter. At time 0, both sides agree that block 0 has been decoded. The current pointer for both is
set at 1.

3) The forward encoder transmits the oldest waiting block using an infinite-length random codebook (rateless code)
with a new codebook being drawn for each block. The codewords themselves consist of iid uniform $C_f$-bit
packets.

Formally, the codewords are $X_i(j, t)$ where $i > 0$ represents the current block number, $t > 0$ is the current
time, and $0 \le j < 2^{n c k_f R C_f}$ is the value of the current block. Each $X_i(j, t)$ is drawn iid and is $k_f$ packets
long.

4) The forward decoder uses the received channel symbols $Y(t)$ to eliminate potential messages (codewords)
that could have been sent as the current block $i$. As soon as there is only one solitary codeword $j$ left, the
decoder considers it to be the true value $j$ for that block and the block is marked as successfully decoded.
When the current received packet $Y(t)$ is incompatible with this solitary codeword (i.e., $-1 \neq Y(t) \neq X_i(j, t)$),
then the current block count $i$ is incremented at the decoder and it considers the next block to have begun.

5) The feedback encoder always uses its one bit to send back the modulo 2 number of the last block (usually
$i - 1$, but sometimes $i$ when the current block has been decoded and the receiver is still waiting for a sign
that the next block has begun) that was successfully decoded.

6) If the forward encoder receives feedback indicating that the current block has been successfully decoded, it
removes the current block from the queue, increments the current $i$ pointer, and moves on to the next block.
If there are no blocks awaiting transmission, the encoder can just continue extending the current codeword
until there is something new to send.

Fig. 4. The impact of list decoding is significant only at low rates. The top curve is the uncertainty-focusing bound. The dotted segment is
the part that is not achieved. The red curve represents what is achievable using Theorem 2.2 and the thick black curve represents Theorem 2.1.
The discontinuity in the thick curve indicates when it is worth switching between the two theorems. The forward packet size $C_f = 4$ and
the probability of erasure is 0.25 on both the forward and feedback links. (Axis label: rate in packets per slot.)

An interesting feature of this scheme is that there are no explicit sequence numbers on the forward packets unlike 
the approach of [26]. Instead, they are implicit. This prevents a loss of rate. Synchronization between the forward 
encoder side and decoder side is maintained because: 

• They start out synchronized. 

• The forward encoder can only increment its pointer after getting explicit feedback from the decoder telling it 
that the block has been correctly received. Because the feedback channel is an erasure channel, this implies 
that such an acknowledgement was actually sent. 

• The decoder can only increment its pointer i after receiving a symbol that is incompatible with the prior 
codeword. Because the forward channel is an erasure channel, the only way this can happen is if the packet 
was indeed sent from a new codebook indicating unambiguously that the forward encoder has incremented its 
pointer. 
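
The synchronization argument can be exercised with a small simulation. The Python sketch below is a deliberately abstract model for $k_f = k_b = 1$ and is not the paper's construction: the rateless code is replaced by the hypothetical assumption that a block is decoded once `need` unerased packets from it have arrived, a packet from a new codebook looks compatible with the old codeword with probability $2^{-C_f}$, and the encoder always has another block waiting. All names are illustrative.

```python
import random


def simulate_handshake(num_blocks, need, C_f, beta_f, beta_b, seed=0):
    """Simulate the implicit-sequence-number handshake and return the slots used."""
    rng = random.Random(seed)
    enc = 1           # block the forward encoder is currently sending
    dec = 1           # block the decoder believes is current
    received = 0      # unerased packets seen from block `dec`
    decoded = False   # whether block `dec` has been decoded
    slots = 0
    while not (dec == num_blocks and decoded):
        slots += 1
        if rng.random() >= beta_f:  # forward packet not erased
            if enc == dec and not decoded:
                received += 1
                decoded = received >= need
            elif enc == dec + 1 and rng.random() >= 2.0 ** (-C_f):
                # Incompatible with the old codeword: the next block has begun.
                dec, received = dec + 1, 1
                decoded = received >= need
        # One feedback bit: parity of the last successfully decoded block.
        last_decoded = dec if decoded else dec - 1
        if rng.random() >= beta_b and last_decoded % 2 == enc % 2:
            enc += 1  # acknowledgement received: move on to the next block
        # The encoder is never behind, and is ahead by one only after a decode.
        assert enc == dec or (enc == dec + 1 and decoded)
    return slots


slots_used = simulate_handshake(50, 5, 4, 0.25, 0.25, seed=1)
```

The assertion inside the loop checks the invariant behind the three bullet points: the two pointers either agree, or the encoder is exactly one block ahead of a decoder that has already decoded its current block. A single parity bit of feedback is enough because the pointers never differ by more than one.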
Fig. 5. The upper curve is the uncertainty-focusing bound with perfect and free feedback and the lower-most curve is the delay performance
attained by feedback-only error-correction strategies that split the channel equally among forward and feedback uses. The intermediate curves
reflect giving everything to the forward channel and various ratios of forward to feedback channel uses. The envelope of such schemes
(described in Theorem 2.3) is also plotted.

It is thus clear that no errors are ever made. The total delay experienced by a bit can be broken down into three
parts:

1) Assembly delay: How long it takes before the rest of the message block has arrived at the forward encoder.
(Bounded by the constant $n c k_f$ forward channel uses and hence asymptotically irrelevant as $d \to \infty$.)

2) Queuing delay: How long the message block must wait before it begins to be transmitted.

3) Transmission delay: How many channel uses it takes before the codeword is correctly decoded. (Random
quantity $T'_i$ that must be an integer multiple of $k_f$.)

To understand this, a closer look at the transmission delay $T'_i$ is required. First, the $T'_i$ can be upper-bounded by
the service time $T_i$ that measures how long it takes till the forward encoder is sure that the codeword was correctly
decoded. This puts us in the setting of Corollary 6.1 of [4].

1) The service time: T; t = + T^j + T^j consists of the sum of how long it takes to complete three distinct 
stages of service. 

• Ti j: How long till the decoder realizes that the forward encoder has moved on. 

Since the new codeword's symbol Xi(j,t) is drawn independently from the previous codeword's symbol 
Xi-i(jprev,t), the probability of a received packet being ambiguous is just (3r + (1 — (3j)2~ Cs since there is 
a (3f probability of being erased and only a 1 in 2 Cf chance of drawing something identical. Thus: 

V(T 14 >t) = (p f + (l-p f )2- c n l 

= exp(tln(/5 / + (1-^)2"°')) 
= eM-tE (C f ,l)) 
Fig. 6. The upper curve is the uncertainty-focusing bound with perfect and free feedback. The second curve down optimizes the split
between forward and feedback channel uses and counts end-to-end delay in terms of total channel uses. Rate is calculated only relative to
the forward channel uses with the idea that the feedback packets are carrying other useful data. The third curve is the one from Figure 5
that counts both delay and rate relative to total channel uses. The final curve is for forward error-correction only.



and so $T_{1,i}$ has a geometric distribution governed by the exponent $E_0(C_f, 1)$. (Alternatively, this can be seen
directly from the interpretation of $E_0(1)$ as the exponent governing the pairwise probability of confusion for
codewords in the regular union bound [30].)

The $T_{1,i}$ for different values of $i$ are clearly iid since the codebook is independently drawn at each time $t$ and
the channel is memoryless.

• $T_{2,i}$: How long until the decoder is able to decode the codeword uniquely.

Lemma 7.1 of [4] applies to this term without list decoding. The $T_{2,i}$ are thus also iid across blocks $i$ and

$$P\left(T_{2,i} - \lceil \tilde{t}(\rho,R,n) \rceil c k_f > t c k_f\right) \le \exp\left(-t c k_f E_0(C_f, \rho)\right) \qquad (13)$$

for all $\rho \in [0,1]$, where $\tilde{t}(\rho,R,n) = \frac{R}{C(\rho)} n$ and $C(\rho) = \frac{E_0(C_f,\rho)}{\rho C_f \ln 2}$ after adjusting for the units of $C_f$-bit
packets used here for the rate $R$.

• $T_{3,i}$: How long until the encoder realizes that the decoder has moved on.

The only way the encoder could miss this is if all the feedback packets sent since the decoding
succeeded were erased. The only subtlety comes in measuring time. In keeping with tradition, in this section time is
measured in forward channel uses and thus $k_f$ and $k_b$ are needed to translate.

$$P(T_{3,i} > t k_f) = (\beta_b)^{t k_b}$$
$$= \exp\left(-t k_b (-\ln \beta_b)\right)$$

and so the relevant exponent for $T_{3,i}$ is $-\frac{k_b}{k_f} \ln \beta_b$ per forward channel use.

$T_{3,i}$ depends on the randomness in the feedback channel, while $T_{1,i}$ and $T_{2,i}$ both depend on the forward channel
and random codebooks, but at disjoint times. Thus the three terms are independent of each other since they
depend on different independent random variables.
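
The three stage exponents can be compared numerically. The following is a minimal sketch, assuming the packet-erasure Gallager function takes the form $E_0(C,\rho) = -\ln(\beta + (1-\beta)2^{-\rho C})$ for general $\rho$ (the text above states only the $\rho = 1$ case explicitly); the function names and the example parameters are hypothetical.

```python
import math

def e0_erasure(c_bits, rho, beta):
    """Assumed Gallager function (in nats) of a packet-erasure channel with
    c_bits-bit packets, erasure probability beta, and uniform inputs."""
    return -math.log(beta + (1.0 - beta) * 2.0 ** (-rho * c_bits))

def stage_exponents(c_f, rho, beta_f, beta_b, k_f, k_b):
    """Exponents (per forward channel use) governing T_1, T_2 and T_3."""
    e1 = e0_erasure(c_f, 1.0, beta_f)        # decoder notices the new codebook
    e2 = e0_erasure(c_f, rho, beta_f)        # decoder decodes uniquely
    e3 = -(k_b / k_f) * math.log(beta_b)     # encoder hears an acknowledgement
    return e1, e2, e3

# Illustrative parameters only: a BEC in each direction, one feedback use per forward use.
e1, e2, e3 = stage_exponents(c_f=1, rho=0.5, beta_f=0.4, beta_b=0.5, k_f=1, k_b=1)
slowest = min(e1, e2, e3)
print(e1, e2, e3, slowest)
```

Because $E_0$ is increasing in $\rho$, the first exponent never falls below the second for $\rho \in [0,1]$, so only the second and third compete for the minimum.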

It is clear that the sum of three independent geometric random variables can be bounded by the slowest of them
plus a constant. If two of them are equally slow, then the resulting polynomial growth term can be absorbed into a
slightly smaller exponent. Thus for all $\epsilon > 0$, there exists a constant $K$ depending on $\epsilon$ and the triple
$\left(-\frac{k_b}{k_f} \ln \beta_b,\; E_0(C_f,\rho),\; E_0(C_f,1)\right)$ so that:

$$P\left(T_i - \lceil \tilde{t}(\rho,R,n) \rceil c k_f - K > t c k_f\right) \le \exp\left(-t c k_f \left(\min\left(E_0(C_f,\rho), -\frac{k_b}{k_f} \ln \beta_b\right) - \epsilon\right)\right) \qquad (14)$$

since $E_0(C_f,\rho) \le E_0(C_f,1)$ because $\rho \in [0,1]$ and the Gallager function $E_0$ is monotonically increasing in $\rho$.
Notice also that the constant $K$ here does not depend on $n$ or $c$.
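
The claim that the slowest stage sets the exponent of the sum can be checked numerically. The sketch below is an illustration only: it computes the exact tail of a sum of independent geometric random variables on $\{1,2,\ldots\}$ with $P(G > u) = e^{-Eu}$ and reports $-\frac{1}{t}\ln P(\text{sum} > t)$. The helper names and the example exponents are hypothetical.

```python
import math

def add_geometric(pmf, tail, q, t):
    """Given pmf[0..t] and tail = P(S > t) for S, return the same pair for S + G,
    where G is geometric on {1, 2, ...} with P(G > u) = q**u."""
    # S + G > t happens if S > t already, or S = s <= t and G > t - s.
    new_tail = tail + sum(pmf[s] * q ** (t - s) for s in range(t + 1))
    new_pmf = [0.0] * (t + 1)
    for s in range(t + 1):
        if pmf[s] == 0.0:
            continue
        for g in range(1, t - s + 1):
            new_pmf[s + g] += pmf[s] * (1.0 - q) * q ** (g - 1)
    return new_pmf, new_tail

def sum_tail_exponent(exponents, t):
    """-(1/t) ln P(G_1 + ... + G_n > t) for independent geometrics with the given exponents."""
    pmf, tail = [1.0] + [0.0] * t, 0.0      # start from the constant 0
    for e in exponents:
        pmf, tail = add_geometric(pmf, tail, math.exp(-e), t)
    return -math.log(tail) / t

x50 = sum_tail_exponent([0.2, 0.5, 0.9], 50)
x300 = sum_tail_exponent([0.2, 0.5, 0.9], 300)
print(x50, x300)
```

The tail is accumulated directly rather than as one minus a partial sum, so that very small probabilities are not lost to floating-point cancellation. As $t$ grows, the reported exponent climbs toward the smallest of the three input exponents from below.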

2) Finishing the proof: The conditions of Corollary 6.1 of [4] now apply with point messages arriving every
$nck_f$ channel uses, or every $n$ units of time if time is measured in increments of $ck_f$ channel uses. At this point,
the proof proceeds in a manner identical to that of Theorem 3.4 in [4]. As long as $\rho$ is chosen small enough so
that $R < \frac{E_0(C_f,\rho)}{\rho C_f \ln 2}$, there exists $n$ large enough so that the overhead, $\tilde{t}(\rho,R,n)$ plus the contribution of the
constant $K$, is less than $(1-\delta)n$. The effective rate $R''$ from Corollary 6.1 of [4] is thus less than $\frac{1}{\delta n}$ point
messages per $ck_f$ forward channel uses. This can be made
arbitrarily small by making $n$ large and so by Theorem 3.3 in [4] the error exponent with end-to-end delay can be
made arbitrarily close to $\min\left(E_0(C_f,\rho), -\frac{k_b}{k_f} \ln \beta_b\right)$. If the target error exponent $\alpha$ satisfies the condition
assumed in the theorem, then the minimum is known to be $E_0(C_f,\rho)$. This is maximized by increasing $\rho$ so that
$\frac{E_0(C_f,\rho)}{\rho C_f \ln 2}$ approaches $R$ from above.

This demonstrates the asymptotic achievability of all reliability/rate points within the region obtained by varying
$\rho \in [0,1]$:

$$\alpha < \min\left(-\frac{k_b}{k_f} \ln \beta_b,\; E_0(C_f,\rho)\right),$$
$$R < \frac{E_0(C_f,\rho)}{\rho C_f \ln 2}. \qquad (15)$$

Observe that $\rho C_f$ appears together in the expression for $E_0$, and so $\rho C_f$ plays the role of the simple $\rho$ for the
binary erasure channel. Theorem 3.3 in [4] then gives the desired expression for the performance after converting
from base 2 to base e.

The anytime property is inherited from Corollary 6.1 of [4]. □ 
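
The parametric region just derived can be traced numerically. The sketch below is a hedged illustration, not part of the proof: it assumes the packet-erasure form $E_0(C,\rho) = -\ln(\beta + (1-\beta)2^{-\rho C})$, measures the rate in units of $C_f$-bit packets per forward channel use, and returns the supremum of the exponents allowed at a given rate (the region itself is open). All function names are hypothetical.

```python
import math

def e0(c_bits, rho, beta):
    """Assumed Gallager function (nats) of a packet-erasure channel."""
    return -math.log(beta + (1.0 - beta) * 2.0 ** (-rho * c_bits))

def rate_limit(c_f, rho, beta_f):
    """Right-hand side of the rate constraint, E_0(C_f, rho) / (rho C_f ln 2)."""
    return e0(c_f, rho, beta_f) / (rho * c_f * math.log(2.0))

def achievable_exponent(rate, c_f, beta_f, beta_b, k_f, k_b):
    """Return (rho, supremum of alpha) for the region with rho in [0, 1]."""
    if rate >= 1.0 - beta_f:                 # at or above capacity: no positive exponent
        return 0.0, 0.0
    rho = 1.0
    if rate_limit(c_f, 1.0, beta_f) < rate:  # need rho < 1; E_0(rho)/rho is nonincreasing
        lo, hi = 0.0, 1.0
        for _ in range(100):                 # bisection on the rate constraint
            mid = 0.5 * (lo + hi)
            if rate_limit(c_f, mid, beta_f) > rate:
                lo = mid
            else:
                hi = mid
        rho = lo
    feedback = -(k_b / k_f) * math.log(beta_b)
    return rho, min(feedback, e0(c_f, rho, beta_f))

rho_low, alpha_low = achievable_exponent(0.30, 1, 0.5, 0.1, 1, 1)
rho_high, alpha_high = achievable_exponent(0.45, 1, 0.5, 0.1, 1, 1)
rho_fb, alpha_fb = achievable_exponent(0.30, 1, 0.5, 0.9, 1, 1)
print(alpha_low, alpha_high, alpha_fb)
```

In the third call the feedback link is deliberately made poor, so the returned exponent is the feedback term $-\frac{k_b}{k_f}\ln\beta_b$ rather than the forward term.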



B. Theorem 2.2 with list decoding

When the forward and reverse channels have packet sizes of at least 2, it is possible to augment the protocol
to use list decoding to a list of size $\ell$ and some interaction to resolve the list ambiguity. The idea is to have the
feedback encoder tell the forward encoder a subset of bit positions in the message block that it is confused about.
For any pair of distinct messages, there exists a single bit position that would resolve the ambiguity between them.
Since there are $\ell$ messages on the list, $\frac{\ell(\ell-1)}{2}$ such bit positions are clearly sufficient. Once the forward encoder
knows which bit positions the decoder is uncertain about, it can communicate those particular bit values reliably
using a repetition code.
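
The disambiguation idea can be sketched in a few lines. This is an illustrative sketch only, with hypothetical helper names: it picks one distinguishing bit position per unresolved pair of list entries (so at most one position per pair is ever needed) and then filters the list using the true values at those positions.

```python
from itertools import combinations

def distinguishing_positions(candidates):
    """Choose bit positions so that every pair of distinct, equal-length bit
    strings in `candidates` differs in at least one chosen position."""
    positions = set()
    for a, b in combinations(candidates, 2):
        if any(a[p] != b[p] for p in positions):
            continue                          # this pair is already separated
        positions.add(next(i for i in range(len(a)) if a[i] != b[i]))
    return sorted(positions)

def resolve(candidates, positions, true_message):
    """Keep only the candidates that agree with the true message on `positions`."""
    return [c for c in candidates if all(c[p] == true_message[p] for p in positions)]

decoded_list = ["0000", "0001", "0100", "1100"]    # a list of size 4
asked = distinguishing_positions(decoded_list)
survivors = resolve(decoded_list, asked, "0100")
print(asked, survivors)
```

In the protocol described here, the feedback encoder would send the chosen positions and the forward encoder would reply with the bit values; the sketch models only the combinatorial step, not the erasure channels.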

The scheme is an extension of the scheme of Theorem 2.1, except with each message block requiring
$m = 1 + \frac{\ell(\ell-1)}{2}\left(1 + \lceil \log_2 nck_fRC_f \rceil\right)$ rounds of communication instead of just 1 round. To support these multiple
rounds, 1 bit is reserved on every forward packet and every feedback packet to carry the round number modulo 2.

These rounds have the following roles: 

• 1 round: (Forward link leads, feedback link follows) A random codebook $X_i(j,t)$ as in the previous section
is used by the forward encoder as before to communicate most of the information in the message block. The
round stops when the decoder has decoded to within $\ell$ possible choices of the codeword $j$. At that point, the
feedback encoder will increment the round count in the feedback packets and initiate the next round.

• $\frac{\ell(\ell-1)}{2} \lceil \log_2 nck_fRC_f \rceil$ rounds: (Feedback link leads, forward link follows) The feedback encoder uses a
repetition code to communicate $\frac{\ell(\ell-1)}{2}$ different bit positions within the block. Since there are $nck_fRC_f$
bits within a block, it takes $\lceil \log_2 nck_fRC_f \rceil$ bits to specify a specific bit position.

The feedback encoder uses the second bit in each 2-bit packet to carry the repetition code encoding these
positions. As soon as a feedback channel use is successful, the forward encoder will signal the round to advance
by incrementing its counter. If this was the last such round, the forward encoder will initiate the next type of
round. Otherwise, as soon as a forward channel use is successful, the feedback encoder will also increment the
round and move on to the next bit.

• $\frac{\ell(\ell-1)}{2}$ rounds: (Forward link leads, feedback link follows) The forward encoder uses a repetition code to
communicate the specific values of the requested bits. The rounds advance exactly as in the previous
set of rounds.

Synchronization between the encoder and decoder is maintained because: 

• They start out synchronized. 

• The follower advances its counter as soon as it has decoded the round. As soon as the leader hears this, it 
too moves on to the next round. Because each packet comes with an unmistakable counter, it is interpreted 
correctly. 

Because the channels are erasure channels, there is no possibility for confusion. Each follower advances to the 
next round only when it has learned what it needs from this one. 

The proof that this achieves the desired error exponents is mostly parallel to that of the previous section. In the 
interests of brevity, only the differences are discussed here. 

1) The service time: $T_i = T_{1,i} + \sum_{k=1}^{\frac{\ell(\ell-1)}{2} \lceil \log_2 nck_fRC_f \rceil} T_{2,i,k} + \sum_{k=1}^{\frac{\ell(\ell-1)}{2}} T_{3,i,k}$ since each round needs to complete
for the entire block to complete.

• $T_{1,i}$: How long until the decoder is able to decode the codeword to within a list of size $\ell$. This is almost the
same as $T_{2,i}$ in the previous section. The only difference is that the effective forward packet size is $C_f - 1$
bits since 1 bit is reserved for the round number modulo 2.

Lemma 7.1 of [4] applies to this term with a list size of $\ell$. The $T_{1,i}$ are thus iid across blocks $i$ and

$$P\left(T_{1,i} - \lceil \tilde{t}(\rho,R,n) \rceil c k_f > t c k_f\right) \le \exp\left(-t c k_f E_0(C_f - 1, \rho)\right) \qquad (16)$$

for all $\rho \in [0,\ell]$, where $\tilde{t}(\rho,R,n) = \frac{R}{C(\rho)} n$ and $C(\rho) = \frac{E_0(C_f - 1,\rho)}{\rho C_f \ln 2}$ after adjusting for the notation used
here, including the units of $C_f$-bit packets for the rate $R$.

• $T_{2,i,k}$: How long it takes to complete one round of communicating a single bit from the feedback encoder to
the forward encoder. This is the sum of two independent geometric random variables: one counting how long
until a successful use of the feedback channel carrying the bit, and a second counting how long until a successful
use of the forward channel carrying the confirmation that the bit was received.

• $T_{3,i,k}$: How long it takes to complete one round of communicating a single bit from the forward encoder. This
is also the sum of the same two independent geometric random variables.

Use $T_f(k)$ to denote independent geometric (in increments of $k_f$) random variables counting how long it takes
until a successful use of the forward channel. Similarly, use $T_b(k)$ for the backward channel in increments of $k_b$.
Thus, $T = T_1 + \sum_{k=1}^{m-1}\left(T_f(k) + \frac{k_f}{k_b} T_b(k)\right)$ has the distribution of the service time in terms of forward channel uses.

Clearly $E_0(C_f - 1,\rho) < -\ln \beta_f$ for all $\rho > 0$. There are two possibilities depending on whether $T_1$ provides the
dominant error exponent, that is, whether $E_0(C_f - 1, \rho^*) < -\frac{k_b}{k_f} \ln \beta_b$ for the $\rho^*$ that solves $C(\rho^*) = R$. If the feedback channel
provides the dominant exponent, set $\rho < \rho^*$ so that $E_0(C_f - 1,\rho) = -\frac{k_b}{k_f} \ln \beta_b$. Otherwise, leave $\rho < \rho^*$ free for
now. Define $\gamma = E_0(C_f - 1,\rho)$ as the dominant exponent.

Let $T'(k)$ be iid geometric random variables (in increments of $ck_f$) that are governed by the exponent $\gamma$ so
$P(T' > tck_f) = \exp(-tck_f\gamma)$. Consider $\sum_{k=1}^{2m-1} T'(k)$. This has a negative binomial or Pascal distribution.

Lemma 3.1: Let $T'(k)$ be iid geometric random variables that are governed by the exponent $\gamma$ so $P(T' > t) =
\exp(-t\gamma)$. Then, for every $\epsilon' > 0$, there exists an $\epsilon > 0$ that depends only on $\gamma$ and $\epsilon'$ so that $\forall t > 0$:

$$P\left(\sum_{k=1}^{2m-1} T'(k) > t + \bar{t}\right) \le \exp\left(-\gamma(1 - \epsilon')(t + \bar{t})\right) \qquad (17)$$

where $\bar{t} = \frac{2m-2}{\epsilon}$.

Proof: See Appendix I.

This means that the service time $T_i$ has a complementary CDF that is bounded by:

$$P\left(T_i - \left(\lceil \tilde{t}(\rho,R,n) \rceil + \bar{t}\right) c k_f > t c k_f\right) \le \exp\left(-t c k_f (1 - \epsilon') E_0(C_f - 1, \rho)\right). \qquad (18)$$

2) Finishing the proof: The proof can continue almost the same way as in the previous section. All that needs
to be checked is that for a given target error exponent $(1 - \epsilon')E_0(C_f - 1, \rho)$, the overhead $\lceil \tilde{t}(\rho, R, n) \rceil + \bar{t}$ can be
made smaller than $n$ so that the point message rate $R''$ in Corollary 6.1 of [4] can be made to go to zero.

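Assuming that $\ell \ge 2$ (otherwise what is the point of using list decoding!):

$$\lceil \tilde{t}(\rho,R,n) \rceil + \bar{t} \le 1 + \frac{R}{C(\rho)} n + \frac{\ell(\ell-1)\left(1 + \lceil \log_2 nck_fRC_f \rceil\right)}{\epsilon}$$
$$\le \left[\frac{R}{C(\rho)} + \frac{1}{n} + \frac{\ell(\ell-1)\left(2 + \log_2(ck_fRC_f)\right)}{\epsilon n} + \frac{\ell(\ell-1)}{\epsilon} \frac{\log_2 n}{n}\right] n.$$

Clearly whenever $C(\rho) > R$, there exists an $n$ big enough so that the entire term in brackets $[\cdots] < 1 - \delta$ for
some small $\delta > 0$. From this point on, the proof proceeds exactly as before. Recall that $\epsilon'$ is arbitrary so this gets
us asymptotically to the fixed-delay reliability region parametrized as:

$$\alpha < \min\left(E_0(C_f - 1,\rho),\; -\frac{k_b}{k_f} \ln \beta_b\right),$$
$$R < \frac{E_0(C_f - 1,\rho)}{\rho C_f \ln 2} \qquad (19)$$

where $\rho$ ranges from 0 to $\ell$. But $\ell$ can be chosen as high as needed. Finally, the rate in (19) can be rewritten as
$R < \frac{C_f - 1}{C_f}\left(\frac{E_0(C_f - 1,\rho)}{\rho (C_f - 1) \ln 2}\right)$. Notice that $(C_f - 1)\rho$ appear together in the expression for $E_0(C_f - 1, \rho)$ in the place
of the simple $\rho$ for the binary erasure channel. This lets us use Theorem 3.3 in [4] to get the desired expression,
once again doing the straightforward conversions from base 2 to base e. □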



C. Extensions 

While packet-erasure channels were considered for concreteness of exposition, it is clear that Theorem 2.1 extends
to any channel on the forward link for which the zero-undetected-error capacity (see footnote 3) equals the regular capacity (See
Problem 5.32 in [31]). If the probability of undetected error is zero, then decoding proceeds by eliminating codewords
as being impossible. That is all that is needed in this proof. In particular, the result extends immediately to
packet-valued channels that can erase individual bits within packets according to some joint distribution rather than having
to erase only the entire packet or nothing at all.

Similarly on the feedback path, the proof of Theorem 2.1 only requires the ability to carry a single bit message
unambiguously in a random amount of time, where that time has a distribution that is bounded by a geometric.
For a channel whose zero-undetected-error capacity equals the regular capacity, a random code can be used but
with only two codewords. This gives an exponent of at least $E_0(1)$ on the feedback path. Therefore, the arguments
of this section have essentially already proved the following theorem showing optimality at high rates for more
general channels:

Theorem 3.1: Consider the $(k_f,k_b)$ problem of Figures 1 and 2 with $k_f,k_b \ge 1$, and forward DMC $P_f$ and
backward DMC $P_b$, both with their zero-undetected-error capacities (without feedback; see footnote 4) equal to their regular
Shannon capacities. Suppose that $k_r > 0$ is the round-trip delay (measured in cycles).

In the limit of large end-to-end delays, it is possible to asymptotically achieve all $(R, \alpha)$ pairs

$$\alpha < \min\left(\frac{k_b}{k_f} E_0^b(1),\; E_0^f(\rho)\right),$$
$$R < \frac{E_0^f(\rho)}{\rho} \qquad (20)$$
P 

3 Zero-undetected-error means that the probability of error is zero if the decoder is also allowed to refuse to decode. For capacity to be 
meaningful, the probability of such refusals must approach zero as the delay or block-length gets large. 

4 Perfect feedback increases the zero-undetected-error capacity all the way to the Shannon capacity for DMCs. A system can just use a 
one bit message at the end to tell the decoder whether or not to accept its tentative decoding [32]. 
in an anytime fashion, where $E_0^f$ and $E_0^b$ are the Gallager functions for the forward and reverse channels respectively
and $\rho \in [0,1]$. The rate $R$ above is measured in nats per forward channel use, as is the reliability $\alpha$.
Proof: To deal with the round-trip delay, just extend the cycle length by considering $k'_f = \frac{1}{\epsilon} k_f k_r$, $k'_b = \frac{1}{\epsilon} k_b k_r$.
Consider the last $k_f k_r$ channel uses of each extended cycle to be wasted. The effective number of forward channel
uses is thus reduced by a factor $(1 - \epsilon)$. Since this reduction can be made as small as we like, asymptotically
the problem reduces to the case with no round-trip delay.

To patch the proof of Theorem 2.1 to account for the general channels on the forward and feedback links:

• Use the $E_0^f(\rho^*)$-optimizing input distribution $q(\rho^*)$ for the random forward channel codebooks. $\rho^* \in [0, 1]$ is
chosen so that the target $\alpha < E_0^f(\rho^*)$ and $R < \frac{E_0^f(\rho^*)}{\rho^*}$.

• Use the $E_0^b(1)$-optimizing input distribution for the random two-codeword codebooks on the feedback channel.

• The analysis of $T_{1,i}$ is unchanged since $E_0^f(1, q(\rho^*)) \ge E_0^f(\rho^*)$ by the properties of the Gallager function
[30].

• The analysis of $T_{2,i}$ is entirely unchanged and gives $E_0^f(\rho^*)$ as the relevant exponent.

• The analysis of $T_{3,i}$ is now exactly parallel to $T_{1,i}$ since this also succeeds the instant the two random codewords
on the feedback link can be distinguished by the received symbol. This is governed by the exponent $E_0^b(1)$ in
terms of feedback channel uses and thus $\frac{k_b}{k_f} E_0^b(1)$ in terms of forward channel uses.

Everything else proceeds identically, except with rate units in nats per forward channel use rather than in normalized
units of $C_f$ bits. □

It is unclear how to extend Theorem 2.2 to these general erasure-style channels. To break the $E_0(1)$ barrier, the
construction in Theorem 2.2 relies on having a single header bit that always shows up. This approach does extend
to the packet-truncation channels of [33] by making the header bit come first, but it does not extend to erasure-style
channels in which individual bits within a packet can be erased in some arbitrary fashion.

The restriction to channels whose zero-undetected-error capacity without feedback equals their Shannon capacity
is quite strict, and is required for the above schemes to work. This allows us to use imperfect feedback since the
decoder can be counted on to know when to stop on its own. However, the approach of Section III-B can be used
to extend Theorem 3.4 of [4] when the feedback is perfect. Recall that Theorem 3.4 of [4] requires a zero-error
capacity that is strictly greater than zero. The multi-round approach with repetition codes allows us to drop this
condition and merely require that the zero-undetected-error capacity without feedback be strictly greater than zero
(i.e. the channel matrix $P$ has at least one zero entry in a row and column that is not identically zero).

Theorem 3.2: For any DMC whose transition matrix $P$ contains a nontrivial zero, it is possible to use noiseless
perfect feedback and randomized encoders to asymptotically approach all delay exponents within the region

$$\alpha < \min\left(E_0(\rho), E^*\right), \qquad (21)$$
$$R < \frac{E_0(\rho)}{\rho}$$

where $E_0(\rho)$ is the Gallager function, $\rho$ ranges from 0 to $\infty$, and $E^*$ is the error exponent governing the
zero-undetected-error transmission of a single bit message.

Furthermore, the delay exponents $\alpha$ can be achieved in a delay-universal or "anytime" sense even if the feedback
is delayed by an amount $\phi$ that is small relative to the asymptotically large target end-to-end delay.
Proof: The proof and overall approach are almost identical to what has been done previously. Only the relevant differences
will be covered here.

First, it is well known that the presence of a nontrivial zero makes the zero-undetected-error capacity strictly
positive. To review, let $x, x'$ be input letters so that there exists a $y$ such that $P(Y = y|X = x) = 0$ while
$P(Y = y|X = x') > 0$. Then the following two mini-codewords can be used: $(x,x')$ and $(x',x)$. The probability of
unambiguous decoding is at least $P(Y = y|X = x')$ and so by using this as a repetition code, the error exponent
$E^* \ge -\frac{1}{2} \ln\left(1 - P(Y = y|X = x')\right) > 0$ per forward channel use. Of course, it can be much higher depending on
the specific channel.
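
The mini-codeword argument can be made concrete with a short sketch. It is an illustration under stated assumptions: bit 0 is sent as $(x, x')$ and bit 1 as $(x', x)$, the output letter `y` is impossible under input $x$, and the decoder refuses to decide unless it sees `y`. The function names are hypothetical.

```python
import math

def e_star_lower_bound(p_y_given_xprime):
    """Lower bound -(1/2) ln(1 - P(Y=y|X=x')) on the single-bit exponent,
    in nats per forward channel use (each mini-codeword uses two channel uses)."""
    return -0.5 * math.log(1.0 - p_y_given_xprime)

def decode_minicodeword(received, y):
    """Decode one received pair. Seeing y in a position proves x' was sent there.
    Returns 0, 1, or None when the pair is ambiguous (a detected, not undetected, failure)."""
    first, second = received
    if second == y:
        return 0          # x' was in the second position, so (x, x') was sent
    if first == y:
        return 1          # x' was in the first position, so (x', x) was sent
    return None

bound = e_star_lower_bound(0.5)
print(bound)
print(decode_minicodeword(("a", "y"), "y"), decode_minicodeword(("y", "a"), "y"),
      decode_minicodeword(("a", "a"), "y"))
```

The decoder never outputs a wrong bit, since `y` cannot occur in a position that carried $x$; repeating the mini-codeword until a decision is reached gives the geometric stopping time used in the text.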

The $(n, c, \ell)$ scheme of [4] is modified as follows:

• Each chunk of $c$ channel uses is now itself implemented as variable length. It consists of a fixed $c_f$ channel
uses that are used exactly as before to carry a part of a random codeword. To this is appended a
variable-length code that uses perfect feedback to communicate exactly 1 bit without error. This bit consists of the
"punctuation" telling the decoder whether or not there is a comma after this chunk (i.e. whether the decoder's
tentative list-decoding contains the true codeword or not).

• If the list size $\ell = 2^{l}$, then the $l$ bits of list-disambiguation information are conveyed by using $l$ successive
single-bit variable-length codes before the next block begins.

Perfect noiseless feedback is assumed as in [4] so that the forward encoder knows when to stop each round and
move on. No headers are required. The main difference from Theorem 2.2 is that the number of rounds required
to communicate a message block is not fixed. Instead, each message takes a variable number of rounds.

The idea is to choose $c - c_f$ large and then to pick an effective $c_f$ that is so big that it is almost proportional to
$c$. Let $T_i$ be the total service time for the $i$-th message block as measured in forward channel uses. It is clear that
this is the sum of

• $T_{1,i}c_f$: The number of forward channel uses required for the random codeword before the message can be
correctly list-decoded to within $\ell$ possibilities. This is exactly as before and is governed by Lemma 7.1 of [4].

• $\sum_k T_{2,i,k}$: The number of forward channel uses required to communicate the distinct punctuation
symbols in the block. Each individually is governed by the exponent $E^*$.

This is the qualitatively new term.

• $\sum_{k=1}^{l} T_{3,i,k}$: The number of forward channel uses required to communicate all of the $l$ distinct 1-bit
disambiguation messages. These are also governed by the exponent $E^*$.

It is easiest to upper-bound the complementary CDF for $T_{1,i}c_f + \sum_k T_{2,i,k} + \sum_k T_{3,i,k}$ together. Define
$\epsilon = \frac{1}{c - c_f}$ and $\tilde{t}(\rho,R,n) = \lceil \frac{R}{C(\rho)} n \rceil$. Writing $\tilde{t}$ for $\tilde{t}(\rho,R,n)$,

$$P\left(T_{1,i}c_f + \sum_k T_{2,i,k} + \sum_k T_{3,i,k} - \tilde{t} c > tc\right)$$
$$\le_a P\left(T_{1,i}c_f - \tilde{t} c_f > t c_f\right) + \sum_{s=1}^{t} P\left(T_{1,i}c_f - \tilde{t} c_f = (t-s)c_f\right) P\left(\sum_{k=1}^{\tilde{t}+t-s+1} T_{2,i,k} > \tilde{t}(c - c_f) + (t-s)(c - c_f) + sc\right)$$
$$\le_b \exp(-t c_f E_0(\rho)) + \sum_{s=1}^{t} \exp(-(t-s)c_f E_0(\rho))\, P\left(\sum_{k=1}^{\tilde{t}+t-s+1} T_{2,i,k} > (\tilde{t} + t - s)(c - c_f) + sc\right)$$
$$= \exp(-t c_f E_0(\rho)) + \sum_{s=1}^{t} \exp(-(t-s)c_f E_0(\rho))\, P\left(\sum_{k=1}^{\tilde{t}+t-s+1} T_{2,i,k} > \frac{\tilde{t}}{\epsilon} + \frac{t-s}{\epsilon} + sc\right)$$
$$\le_c \exp(-t c_f E_0(\rho)) + \sum_{s=1}^{t} \exp(-(t-s)c_f E_0(\rho)) \exp\left(-(1-\epsilon')\left(\frac{\tilde{t}}{\epsilon} + \frac{t-s}{\epsilon} + sc\right)E^*\right)$$
$$=_d \exp(-t c_f E_0(\rho)) + \exp\left(-(1-\epsilon')\left(\frac{\tilde{t}}{\epsilon} + \frac{t}{\epsilon}\right)E^*\right) \sum_{s=1}^{t} \exp(-(t-s)c_f E_0(\rho)) \exp(-(1-\epsilon') s c_f E^*)$$
$$\le_e 2\exp\left(-t c_f \min(E_0(\rho), (1-\epsilon')E^*)\right)$$
$$=_f 2\exp\left(-t c \left(\frac{c_f}{c}\right) \min(E_0(\rho), (1-\epsilon')E^*)\right)$$

where (a) is a union bound over different ways that the budget of $\tilde{t}(\rho, R, n)c + tc$ channel uses could be exceeded
together with the independence of the different component service times. Notice that all the terms governed by $E^*$
are folded in together. (b) comes from simple algebra together with applying Lemma 7.1 of [4] and is valid as long
as $\rho \le \ell$. (c) is the result of substituting in the definition of $\epsilon$ and then applying Lemma 3.1. (d) brings out the
$sc_f$ term in the second term and then (e) reflects that the sum of exponentials is dominated by the largest term.
(f) is a simple renormalization to $c$ units so that the result is plug-in compatible with Lemma 7.1 of [4].
Choosing $c_f$ large enough tells us that for all $\epsilon'' > 0$, there exist $c, c_f$ large enough so that:

$$P\left(T_i - \tilde{t}(\rho, R, n)c > tc\right) \le 2\left[\exp\left(-c(1 - \epsilon'') \min(E_0(\rho), E^*)\right)\right]^{t} \qquad (22)$$

as long as $\rho \in [0,\ell]$. From this point onward, the proof is identical to the original in [4]. □

IV. Splitting a shared resource between the forward and feedback channel: Theorem 2.3

The goal of this section is to prove Theorem 2.3 by considering what the best choices for $k_f$ and $k_b$ are if both
the rate and delay are considered relative to the sum $k_f + k_b$ rather than just forward channel uses alone.

A. Evaluating the previous schemes

Assume $k_f$ and $k_b$ are fixed and let $\eta_f = \frac{k_f}{k_f + k_b}$ and $\eta_b = \frac{k_b}{k_f + k_b}$. The $\eta_f$ acts as the conversion factor mapping
both the error exponents and the rates from per-forward-channel-use units to per-total-channel-uses units. Similarly,
let $\xi_f = \frac{k_f C_f}{k_f C_f + k_b C_b}$ and $\xi_b = \frac{k_b C_b}{k_f C_f + k_b C_b}$. The $\xi_f$ is the conversion factor that maps error exponents and rates to
per-weighted-total-channel-uses units.

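Thus for Theorem 2.1, (15) becomes

$$\alpha' < \min\left(-\eta_b \ln \beta_b,\; \eta_f E_0(C_f,\rho)\right), \qquad (23)$$
$$\alpha'' < \min\left(-\frac{k_b}{k_f C_f + k_b C_b} \ln \beta_b,\; \xi_f E_0(C_f,\rho)\right). \qquad (24)$$

For Theorem 2.2 the range of $\rho$ expands to $\rho \in [0, \infty)$ but the rate terms change to

$$R' < \frac{\eta_f E_0(C_f - 1,\rho)}{\rho C_f \ln 2}, \qquad (27)$$
$$R'' < \frac{\xi_f E_0(C_f - 1,\rho)}{\rho C_f \ln 2}.$$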

B. Optimizing by adjusting $k_f$ and $k_b$

If $-\eta_b \ln \beta_b > \eta_f E_0(C_f,\rho)$, then it is the forward link that is the bottleneck. If the inequality is in the opposite
direction, then the feedback link is what is limiting reliability. This suggests that setting the two exponents equal
to each other gives a good exponent $\alpha'$. Since $\eta_b = 1 - \eta_f$, this means $\eta_f = \frac{-\ln \beta_b}{E_0(C_f,\rho) - \ln \beta_b}$. Plugging this in reveals
that all $\alpha'$ are achievable that satisfy:

$$\alpha' < \frac{(-\ln \beta_b) E_0(C_f,\rho)}{E_0(C_f,\rho) - \ln \beta_b}.$$

Plugging in for $R'$ reveals that all the $R'$ that satisfy

$$R' < \frac{\eta_f E_0(C_f,\rho)}{\rho C_f \ln 2} \qquad (32)$$
$$= \frac{E_0'(C_f,\rho)}{\rho C_f \ln 2} \qquad (33)$$

are also achievable, where $E_0'(C_f,\rho)$ denotes the right-hand side of the bound on $\alpha'$ above. This establishes the
first region claimed in the theorem, and identical arguments give the second.

To see what happens when $C_f$ gets large while $C_b$ stays constant, just notice that in such a case $\xi_f = \frac{k_f C_f}{k_f C_f + k_b C_b}$
gets close to 1 no matter how big $k_b$ is. This establishes the corresponding tradeoff in the theorem.

Alternatively, $k_b$ can be chosen to be large enough so that $-\frac{k_b}{k_b C_b + k_f C_f} \ln \beta_b > -\xi_f \ln \beta_f > \xi_f E_0(C_f,\rho)$. Then
taking $C_f \to \infty$ immediately gives the desired tradeoff. □
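
The balancing step is easy to check numerically. The sketch below is illustrative only: it takes the forward exponent $E_0(C_f,\rho)$ as a plain number, equalizes the forward and feedback exponents per total channel use, and confirms that no other split does better. The function name and the example values are hypothetical.

```python
import math

def optimal_split(e0_forward, beta_b):
    """Return (eta_f, alpha) where eta_f is the fraction of channel uses given to the
    forward link so that eta_f * E_0 equals (1 - eta_f) * (-ln beta_b)."""
    feedback = -math.log(beta_b)
    eta_f = feedback / (e0_forward + feedback)
    return eta_f, eta_f * e0_forward

def exponent_for_split(eta_f, e0_forward, beta_b):
    """Exponent per total channel use for an arbitrary split: the weaker of the two links."""
    return min(eta_f * e0_forward, (1.0 - eta_f) * (-math.log(beta_b)))

eta_f, alpha = optimal_split(e0_forward=0.3, beta_b=0.5)
harmonic = 1.0 / (1.0 / 0.3 + 1.0 / (-math.log(0.5)))
print(eta_f, alpha, harmonic)
```

The balanced exponent equals the inverse of the sum of the inverse exponents, the form that the general-channel extension of this result takes.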
Theorem 2.3 can clearly be extended to the general setting of Theorem 3.1. The relevant $E_0'(\rho)$ is immediately
seen to be

$$E_0'(\rho) = \left(\left(E_0^b(1)\right)^{-1} + \left(E_0^f(\rho)\right)^{-1}\right)^{-1}. \qquad (34)$$

The parallel to Theorem 3.5 of [4] is obvious and makes sense since both involve splitting a shared resource to two 
purposes that must be balanced. Here, it is channel uses across the feedback and forward channels. In Theorem 3.5 
of [4], it is allocating forward channel uses to carrying messages and flow control information. 

V. Conclusions 

It has been shown that in the limit of large end-to-end delays, perfect feedback performance is attainable by 
using appropriate random codes at high rate for erasure channels even if the feedback channel is an unreliable 
erasure channel. Somewhat surprisingly, this does not require any explicit header bits on the packets if the rate is 
high enough and thus works even for a system with a BEC in the forward link and a BEC in the feedback link. The 
reliability gains from using feedback are so large that they persist even when each feedback channel use comes at 
the cost of not being able to use the forward channel (half-duplex). This was shown by considering both rate and 
delay in terms of total channel uses rather than just the forward channel uses. 

The arguments here readily generalize to all channels for which the zero-undetected-error capacity equals the 
regular capacity, but do not extend to channels like the BSC. Even when the zero-undetected-error capacity is strictly 
larger than zero, the techniques here just give an improved result for the case of perfect feedback. Showing that 
the gains from feedback are robust to unreliable feedback in such cases remains an open problem. In addition, the 
results here are on the achievability side. The best upper-bounds to reliability in the fixed end-to-end delay context 
are still those from [4] and it remains an open problem to tighten the bounds when the feedback is unreliable or 
in the half-duplex situation. 

Appendix I 
Pascal distribution bound: Lemma 3.1

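Consider $\sum_{k=1}^{2m-1} T'(k)$ where the $T'(k)$ are iid geometric random variables with exponent $\gamma$. This has a negative
binomial or Pascal distribution. The probability distribution of this sum is easily bounded by interpreting the Pascal
distribution as the $(2m-1)$-th arrival time of a virtual $\{Z_k\}$ Bernoulli process with probability of failure
$\exp(-\gamma)$. Pick an $\epsilon > 0$ and let $\bar{t} = \frac{2m-2}{\epsilon}$.

It is clear that

$$P\left(\sum_{k=1}^{2m-1} T'(k) - \bar{t} > t\right) = P\left(\sum_{k=1}^{t+\bar{t}} Z_k < 2m-1\right)$$
$$= P\left(\frac{\sum_{k=1}^{t+\bar{t}} Z_k}{t+\bar{t}} \le \frac{2m-2}{t+\bar{t}}\right)$$
$$\le P\left(\frac{\sum_{k=1}^{t+\bar{t}} Z_k}{t+\bar{t}} \le \frac{2m-2}{\bar{t}}\right)$$
$$= P\left(\frac{\sum_{k=1}^{t+\bar{t}} Z_k}{t+\bar{t}} \le \epsilon\right)$$
$$\le (t+\bar{t}) \exp\left(-(t+\bar{t}) D(1-\epsilon \,\|\, \exp(-\gamma))\right).$$

But when $\epsilon$ is small,

$$D(1-\epsilon \,\|\, \exp(-\gamma)) = (1-\epsilon)(\ln(1-\epsilon) + \gamma) + \epsilon\left(\ln\left(\frac{1}{1-\exp(-\gamma)}\right) + \ln \epsilon\right)$$
$$\ge (1-\epsilon)(\gamma - 2\epsilon) + \epsilon \ln \epsilon$$
$$= (1-\epsilon)\gamma - \epsilon\left(\ln \frac{1}{\epsilon} + 2(1-\epsilon)\right).$$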
e 
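So:

$$P\left(\sum_{k=1}^{2m-1} T'(k) - \bar{t} > t\right) \le (t+\bar{t}) \exp\left(-(t+\bar{t}) D(1-\epsilon \,\|\, \exp(-\gamma))\right)$$
$$\le \exp\left(-(t+\bar{t})\left[(1-\epsilon)\gamma - \epsilon\left(\ln \frac{1}{\epsilon} + 2(1-\epsilon)\right) - \frac{\ln(t+\bar{t})}{t+\bar{t}}\right]\right)$$
$$\le \exp\left(-(t+\bar{t})\left[(1-\epsilon)\gamma - \epsilon\left(\ln \frac{1}{\epsilon} + 2\right) - \frac{\ln \bar{t}}{\bar{t}}\right]\right)$$
$$= \exp\left(-(t+\bar{t})\left[(1-\epsilon)\gamma - \epsilon\left(\ln \frac{1}{\epsilon} + 2 + \frac{\ln(2m-2) + \ln \frac{1}{\epsilon}}{2m-2}\right)\right]\right)$$
$$\le \exp\left(-(t+\bar{t})\left[(1-\epsilon)\gamma - \epsilon\left(2\ln \frac{1}{\epsilon} + 3\right)\right]\right).$$

As $\epsilon \to 0$, the term $\epsilon(2\ln \frac{1}{\epsilon} + 3)$ also vanishes. So for any $\epsilon' > 0$, we can choose $\epsilon$ small enough so that
$(1-\epsilon)\gamma - \epsilon(2\ln \frac{1}{\epsilon} + 3) > (1-\epsilon')\gamma$. This gives:

$$P\left(\sum_{k=1}^{2m-1} T'(k) - \bar{t} > t\right) \le \exp\left(-(t+\bar{t})\left[(1-\epsilon)\gamma - \epsilon\left(2\ln \frac{1}{\epsilon} + 3\right)\right]\right)$$
$$\le \exp\left(-(t+\bar{t})(1-\epsilon')\gamma\right).$$

This completes the proof of Lemma 3.1. □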
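
Two steps of this appendix lend themselves to a quick numerical sanity check: the lower bound on the binary divergence, and the existence of a small enough $\epsilon$ for a given $\epsilon'$. The sketch below is only a spot check on a few parameter values, not a proof; the function names and the halving search for $\epsilon$ are assumptions made for illustration.

```python
import math

def kl_bernoulli(a, b):
    """Binary divergence D(a || b) in nats, for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def kl_lower_bound(eps, gamma):
    """The bound (1 - eps) * gamma - eps * (ln(1/eps) + 2 * (1 - eps)) from the appendix."""
    return (1.0 - eps) * gamma - eps * (math.log(1.0 / eps) + 2.0 * (1.0 - eps))

def choose_eps(gamma, eps_prime):
    """Halve eps until (1 - eps) * gamma - eps * (2 ln(1/eps) + 3) > (1 - eps_prime) * gamma."""
    eps = 0.5
    while (1.0 - eps) * gamma - eps * (2.0 * math.log(1.0 / eps) + 3.0) <= (1.0 - eps_prime) * gamma:
        eps /= 2.0
    return eps

gaps = [kl_bernoulli(1.0 - e, math.exp(-g)) - kl_lower_bound(e, g)
        for g in (0.1, 1.0, 3.0) for e in (0.001, 0.01, 0.1, 0.3)]
eps = choose_eps(gamma=1.0, eps_prime=0.1)
print(min(gaps), eps)
```

The search terminates because the penalty term $\epsilon(2\ln\frac{1}{\epsilon} + 3)$ vanishes as $\epsilon \to 0$, exactly as argued above.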